set.seed(1)
x <- 1:10
y <- 2 + 3 * x + rnorm(10, 0, 2)
X <- cbind(1, x)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
model <- lm(y ~ x)
cbind(beta_hat, coef(model)) [,1] [,2]
1.662353 1.662353
x 3.109464 3.109464
Recall our objective: \[\text{SSE} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\]
Goal: Find \(\hat{\boldsymbol{\beta}}\) that minimizes SSE
Method: Take the derivative with respect to \(\boldsymbol{\beta}\) and set equal to zero
Start with: \[\text{SSE} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})\]
Expand: \[\text{SSE} = \mathbf{y}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\boldsymbol{\beta} - \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\]
Since \(\mathbf{y}^T\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y}\) (both are scalars): \[\text{SSE} = \mathbf{y}^T\mathbf{y} - 2\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{y} + \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\]
Matrix calculus rules we need:
Taking the derivative: \[\frac{\partial \text{SSE}}{\partial \boldsymbol{\beta}} = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta}\]
Set the derivative equal to zero: \[-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\]
Divide by 2: \[-\mathbf{X}^T\mathbf{y} + \mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{0}\]
Rearrange: \[\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}\]
Solve for \(\hat{\boldsymbol{\beta}}\): \[\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]
Any matrix \(\mathbf{X}\) can be decomposed as: \[\mathbf{X} = \mathbf{Q}\mathbf{R}\]
Where:
Any matrix \(\mathbf{X}\) can be decomposed as: \[\mathbf{X} = \mathbf{Q}\mathbf{R}\]
In words: Break down \(\mathbf{X}\) into perpendicular directions (\(\mathbf{Q}\)) and scaling/rotation (\(\mathbf{R}\))
Traditional approach requires: \[\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]
Problems with \((\mathbf{X}^T\mathbf{X})^{-1}\):
What do I mean by “unstable”
Error in solve.default(crossprod(X)) : system is computationally singular: reciprocal condition number = 3.11914e-17
If \(\mathbf{X} = \mathbf{Q}\mathbf{R}\), then: \[\mathbf{X}^T\mathbf{X} = (\mathbf{Q}\mathbf{R})^T(\mathbf{Q}\mathbf{R}) = \mathbf{R}^T\mathbf{Q}^T\mathbf{Q}\mathbf{R} = \mathbf{R}^T\mathbf{R}\]
So: \[ \begin{align} \hat{\boldsymbol{\beta}} &= (\mathbf{R}^T\mathbf{R})^{-1}\mathbf{R}^T\mathbf{Q}^T\mathbf{y} \end{align} \] ::: fragment Note that \((\mathbf{AB})^{-1}=\mathbf{B}^{-1}\mathbf{A}^{-1}\) as long as both \(\mathbf{A}\) and \(\mathbf{B}\) are invertible (so if \(\mathbf{R}\) is invertible then \((\mathbf{R}^T\mathbf{R})^{-1} = \mathbf{R}^{-1}(\mathbf{R}^T)^{-1}\)) :::
So: \[ \begin{align} \hat{\boldsymbol{\beta}} &= (\mathbf{R}^T\mathbf{R})^{-1}\mathbf{R}^T\mathbf{Q}^T\mathbf{y} \\&= \mathbf{R}^{-1}(\mathbf{R}^T)^{-1}\mathbf{R}^T\mathbf{Q}^T\mathbf{y}\\& = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{y} \end{align} \]
Given: \[\mathbf{R} = \begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix}, \quad \mathbf{Q}^T\mathbf{y} = \begin{bmatrix} 8 \\ 12 \\ 15 \end{bmatrix}\]
Solve: \(\mathbf{R}\hat{\boldsymbol{\beta}} = \mathbf{Q}^T\mathbf{y}\)
\[\begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \\ 15 \end{bmatrix}\]
Step 1: Solve bottom row first \[5\hat\beta_3 = 15 \implies \hat\beta_3 = 3\]
\[\begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \\ 15 \end{bmatrix}\]
Step 2: Substitute into second row \[4\beta_2 + 2(3) = 12 \implies 4\beta_2 = 6 \implies \beta_2 = 1.5\]
\[\begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 2 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \\ \hat\beta_3 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \\ 15 \end{bmatrix}\]
Step 3: Substitute into first row \[2\beta_1 + 1(1.5) + 3(3) = 8 \implies 2\beta_1 = -2.5 \implies \beta_1 = -1.25\]
\[\hat{\boldsymbol{\beta}} = \begin{bmatrix} -1.25 \\ 1.5 \\ 3 \end{bmatrix}\]
Key insight: Work backwards through the triangular matrix, using previously solved values to eliminate unknowns in each step.
Given \(\mathbf{X} = \mathbf{Q}\mathbf{R}\), verify that:
Hint: Use the fact that \(\mathbf{Q}^T\mathbf{Q} = \mathbf{I}\) and remember \((\mathbf{AB})^{-1}=\mathbf{B}^{-1}\mathbf{A}^{-1}\) as long as both \(\mathbf{A}\) and \(\mathbf{B}\) are invertible.
05:00
Condition number comparison:
Calculate \(\hat{y}\) using the QR elements.
05:00